INTRODUCTION



The plant seed is not only an organ of propagation and dis- 
persal but also the major plant tissue harvested by humankind. 
The amount of protein present in seeds varies from ~10% (in 
cereals) to ~40% (in certain legumes and oilseeds) of the dry 
weight, forming a major source of dietary protein. Although 
the vast majority of the individual proteins present in mature 
seeds have either metabolic or structural roles, all seeds also 
contain one or more groups of proteins that are present in high 
amounts and that serve to provide a store of amino acids for 
use during germination and seedling growth. These storage 
proteins are of particular importance because they determine 
not only the total protein content of the seed but also its qual- 
ity for various end uses. For example, the low content of lysine, 
threonine, and tryptophan in various cereal seeds and of cys- 
teine and methionine in legume seeds is due to the low 
proportions of these amino acids in the major storage proteins 
and may limit the nutritional quality of the seeds for monogas- 
tric animals. In the case of wheat, the storage proteins form 
the gluten fraction, whose properties are largely responsible 
for the ability to use wheat flour to make bread, other baked 
goods, and pasta. These properties are not shared by the stor- 
age proteins of other cereals. 

In this article, we provide a broad overview of the structures 
and properties of the seed storage proteins of the major crop 
plants, emphasizing their biological roles, their evolutionary 
origins, and their modes of synthesis and deposition. Although 
some storage proteins may also play roles in defense or me- 
tabolism, we focus on those that function solely for storage. 



Characteristics of Seed Storage Proteins 

Despite wide variation in their detailed structures, all seed stor- 
age proteins have a number of common properties. First, they 
are synthesized at high levels in specific tissues and at cer- 
tain stages of development. In fact, their synthesis is regulated 
by nutrition, and they act as a sink for surplus nitrogen. How- 
ever, most also contain cysteine and methionine, and adequate 
sulfur is therefore also required for their synthesis. Many seeds 
contain separate groups of storage proteins, some of which
are rich in sulfur amino acids and others of which are poor
in them. The presence of these groups may allow the plant
to maintain high levels of storage protein synthesis despite vari- 
ations in sulfur availability. The strict tissue specificity of seed 
storage protein synthesis contrasts with that of tuber storage 
proteins, which may be synthesized in vegetative tissues un- 
der unusual conditions (for example, in vitro or after removal 
of tubers) (Shewry, 1995). A second common property of seed 
storage proteins is their presence in the mature seed in dis- 
crete deposits called protein bodies, whose origin has been 
the subject of some dispute and may in fact vary both between 
and within species. Finally, all storage protein fractions are 
mixtures of components that exhibit polymorphism both within 
single genotypes and among genotypes of the same species. 
This polymorphism arises from the presence of multigene 
families and, in some cases, proteolytic processing and 
glycosylation. 



Classification of Seed Storage Proteins 

Because of their abundance and economic importance, seed 
storage proteins were among the earliest of all proteins to be 
characterized. For example, wheat gluten was first isolated in 
1745 (Beccari, 1745), and Brazil nut globulin was crystallized 
in 1859 (Maschke, 1859). However, the detailed study of seed 
storage proteins dates from the turn of the century, when 
Osborne (1924) classified them into groups on the basis of their 
extraction and solubility in water (albumins), dilute saline 
(globulins), alcohol/water mixtures (prolamins), and dilute acid 
or alkali (glutelins). The major seed storage proteins include 
albumins, globulins, and prolamins. 
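
The Osborne scheme described above is, in effect, a lookup from extraction solvent to protein fraction. A minimal sketch of that mapping follows; the dictionary keys and the function name `osborne_fraction` are illustrative assumptions, not terminology from Osborne (1924).

```python
# Osborne solubility fractions, keyed by the extraction solvent named in the text.
# The key strings and function name are hypothetical, chosen for illustration.
OSBORNE_FRACTIONS = {
    "water": "albumins",
    "dilute saline": "globulins",
    "alcohol/water mixtures": "prolamins",
    "dilute acid or alkali": "glutelins",
}


def osborne_fraction(solvent: str) -> str:
    """Return the Osborne fraction extracted by the given solvent."""
    key = solvent.strip().lower()
    if key not in OSBORNE_FRACTIONS:
        raise ValueError(f"no Osborne fraction defined for solvent: {solvent!r}")
    return OSBORNE_FRACTIONS[key]
```

For example, `osborne_fraction("dilute saline")` returns `"globulins"`. The sketch deliberately ignores the sequential nature of a real extraction, in which each solvent is applied to the residue left by the previous one.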



2S ALBUMIN STORAGE PROTEINS 



The 2S albumins were initially defined as a group on the ba-
sis of their sedimentation coefficients (S20,w) of ~2 (Youle and
Huang, 1981). They are widely distributed in dicot seeds and
have been most widely studied in the Cruciferae, notably oil-
seed rape (in which they are called napins) and Arabidopsis.
The napins consist of two polypeptide chains with Mr values
of ~9000 and 4000, which are linked by interchain disulfide
Figure 1. Schematic Structures of Members of the Cereal Prolamin Superfamily.

The cereal prolamin superfamily comprises the 2S albumins of dicots, the prolamins of the Triticeae, oats, and rice, and the β- and γ-zeins of
maize. Three conserved regions (A, B, and C) are present in all except C hordein, although their boundaries are often poorly defined. These
three regions also show homology with each other and contain cysteine residues that may be conserved within or between the different groups
of proteins. For example, the 2S albumins shown all contain eight cysteine residues that are conserved in terms of context and position, including
Cys-Cys and Cys-Xaa-Cys motifs, which are present in many of the other proteins. Based on data of Sharief and Li (1982), Crouch et al. (1983),
Bartels et al. (1986), Boronat et al. (1986), Higgins et al. (1986), Lilley and Inglis (1986), Pedersen et al. (1986), Halford et al. (1987), Egorov (1988),
Chesnut et al. (1989), Masumura et al. (1989), Gayler et al. (1990), Irwin et al. (1990), Entwistle et al. (1991), Kortt et al. (1991), and Shewry et
al. (1993). The sunflower albumin structure and the disulfide structure of SFA8 are unpublished results of A. Tatham, J. Napier, R. Fido, P. Thoyts,
T. Egorov, and P. Shewry.



bonds (Ericson et al., 1986). They are synthesized as single
precursor proteins that are proteolytically cleaved with the loss 
of a linker peptide and short peptides from both the N and 
C termini (Figure 1a; Crouch et al., 1983; Ericson et al., 1986). 



This appears to be the most typical 2S albumin structure: simi- 
lar heterodimeric proteins are present in species as diverse 
as pumpkin (Hara-Nishimura et al., 1993), cotton (Galau et al.,
1992), castor bean (Sharief and Li, 1982), and lupin (Lilley and
Inglis, 1986). The presence of two interchain bonds has been
directly demonstrated in 2S albumins from lupin (Figure 1b). 
Variant types of 2S albumin also occur. Those of pea appear 
to lack interchain disulfide bonds (Higgins et al., 1986), whereas 
the 2S albumins of sunflower remain uncleaved (Figures 1c 
and 1e; Kortt and Caldwell, 1990; Anisimova et al., 1994). In
addition, in sunflower and castor bean, some mRNAs encode 
two mature albumin proteins, each consisting of one or two 
subunits (Figures 1d and 1e; Allen et al., 1987; Irwin et al., 1990; 
P. Thoyts, J. Napier, M. Millichip, T. Griffiths, A. Tatham, A.
Stobart, and P. Shewry, unpublished results). 

Despite differences in their subunit structure and synthe- 
sis, all the 2S albumins are compact globular proteins with 
conserved cysteine residues (Figure 1). Although little is known 
about the detailed three-dimensional structures of 2S albu- 
mins, that of yellow mustard has been reported to contain 
~50% α-helix, with little or no β-sheet (Menéndez-Arias et al.,
1987). The authors proposed a ring structure with tightly packed
α-helices, as suggested for zeins (see later discussion), but
there is no experimental evidence for this structure. 

Much of the recent interest in 2S albumins has focused 
on their exploitation in genetic engineering. Most notably, 
Altenbach et al. (1987, 1992) have used the 2S albumin of Bra- 
zil nut, which is rich in methionine (Youle and Huang, 1981), 
to increase the methionine content of tobacco seeds by up 
to 30%, and Higgins and co-workers have used the methionine- 
rich sunflower 2S albumin SFA8 to increase the methionine 
content of forage grasses (Tabe et al., 1993). In addition, the 
2S albumins of Arabidopsis have been used as "hosts" for the 
synthesis of biologically active peptides, including the pen- 
tapeptide Leu-enkephalin (Vandekerckhove et al., 1989) and 
a 28-residue antibacterial peptide from Xenopus (De Clercq 
et al., 1990; Krebbers et al., 1993). In this work, the peptide 
was expressed as an insert within a variable loop region of 
the 2S albumin and then isolated by enzymatic cleavage. Al- 
though yields in oilseed rape equivalent to 1 kg of a 25-residue 
peptide per hectare were achieved (Krebbers et al., 1993), the
commercial viability of the work is uncertain. 



PROLAMIN STORAGE PROTEINS 



Whereas the 2S albumin and globulin (see later discussion) 
storage proteins are widely distributed in flowering plants, the 
prolamins are restricted to one family, the grasses. These in- 
clude the major cereals, in which prolamins usually account 
for approximately half of the total grain nitrogen. Exceptions 
to this general rule are oats and rice, in which the major stor- 
age proteins are 11S globulin-like and prolamins are present 
at low levels (~5 to 10% of the total grain protein). 

Prolamins are traditionally recognized as a group on the ba- 
sis of their solubility in alcohol/water mixtures (usually 60 to 
70% [v/v] ethanol or 50% [v/v] propan-1-ol) and their high levels 
of glutamine and proline. However, comparisons of amino acid
sequences have shown that this definition must be widened
to include components that are insoluble in aqueous alcohols
in the native state due to the presence of interchain disulfide
bonds and to recognize that all prolamins, even those that are
insoluble in aqueous alcohols, are related, except for the
α-zeins of maize (and their homologs present in related Pani-
coid cereals). All other prolamins form a single group known
as the prolamin superfamily. 



The Prolamin Superfamily of the Triticeae

Our understanding of this storage protein family stemmed ini- 
tially from studies of the temperate cereals of the tribe Triticeae: 
barley, wheat, and rye. The prolamins of all three species are 
highly polymorphic mixtures of components whose Mr values
range from ~30,000 to 90,000. These prolamins are classified
into three groups (Miflin et al., 1983), the S-rich, S-poor, and
high molecular weight (HMW) prolamins, based on their
amino acid sequences. Typical structures of the three types 
of prolamin are summarized in Figures 1f, 1g, and 1h. 

The S-rich prolamins are the quantitatively major prolamin 
group in all three species, accounting for ~80 to 90% of the 
total prolamin fractions. They include polymeric (that is, with 
interchain disulfide bonds) and monomeric (with intrachain di- 
sulfide bonds) components and consist of at least two families 
in each species: the B and γ-hordeins of barley; two types of
γ-secalin of rye; and the α-gliadins, γ-gliadins, and low molec-
ular weight (LMW) glutenin subunits of wheat. Their amino acid
sequences consist of two separate domains: an N-terminal do- 
main composed of repeated sequences, and a nonrepetitive 
C-terminal domain (see Figure 1f). The repetitive domain con- 
sists of tandem or interspersed repeats based on one or two 
short peptide motifs rich in proline and glutamine; this struc- 
ture accounts for the high proportions of these two residues 
in the protein as a whole. For example, the repetitive domain 
of the γ-gliadin shown in Figure 1f is based on a Pro-Gln-Gln-
Pro-Phe-Pro-Gln heptapeptide. This domain forms a second-
ary structure containing β-reverse turns and poly-L-proline II
helix (Tatham et al., 1990), as discussed later for the S-poor
prolamins. In contrast, the nonrepetitive domain appears to 
have a globular structure rich in α-helix. This domain also con-
tains most or all of the cysteine residues. Eight cysteines are
present in the monomeric γ-gliadin, which form four intrachain
disulfide bonds (Figure 1f). Six of these cysteine residues are
also present in the monomeric α-gliadins (based on sequence
context); additional "unpaired" cysteine residues present in the 
polymeric LMW glutenin subunits may be responsible for poly- 
mer formation (see Shewry et al., 1993). 

The S-poor prolamins include C hordein of barley (Figure 
1g), the ω-secalins of rye, and the ω-gliadins of wheat (Kasarda
et al., 1983). Several genes encoding ω-secalins and C hor-
deins have been isolated (Entwistle et al., 1991; Hull et al., 1991;
Sayanova, 1993). In all cases, the encoded proteins consist
almost entirely of repeats of the octapeptide motif Pro-Gln-Gln-
Pro-Phe-Pro-Gln-Gln that are flanked at the N-terminal side
by short unique sequences of 12 residues and at the C-terminal
side by short unique sequences of either six (C hordeins; see
Figure 1g) or four (ω-secalin) residues. The S-poor prolamins
generally lack cysteine residues and therefore cannot form
oligomers or polymers. Structural studies of C hordein indi- 
cate that the highly conserved repetitive primary structure 
results in a similarly conserved supersecondary structure. This 
is a loose spiral based on elements of β-turn and poly-L-proline
II helix, the whole molecule forming a "stiff worm-like coil" of
~70 nm in length (I'Anson et al., 1992).

The HMW prolamins are typified by the HMW subunits of 
wheat glutenin, which have been studied in detail because 
of their putative role in determining the elasticity, and hence 
the bread-making performance, of wheat doughs (Payne, 1987; 
Shewry et al., 1989, 1992). Extensive repeated sequences are
present, flanked by nonrepetitive N- and C-terminal domains 
(Figure 1h). The repeated sequences are based on the motifs 
Gly-Tyr-Tyr-Pro-Thr-Ser-Pro or Leu-Gln-Gln, Pro-Gly-Gln-Gly-Gln-
Gln, and, in some subunits only, Gly-Gln-Gln. Differences in
the number of repeated peptides are largely responsible for 
variation in HMW subunit size. 

Although the repeated sequences present in the HMW 
subunits are not related to those in the S-poor prolamins, they 
appear to adopt a similar spiral supersecondary structure, al-
though one that is more compact because it includes β-turns
but not poly-L-proline II structure. The net result is a rod-shaped 
molecule (Field et al., 1987), which has been imaged directly 
by scanning probe microscopy (Miles et al., 1991). As in the 
S-rich prolamins, cysteine residues are largely restricted to
the nonrepetitive domains (Figure 1h). These domains appear
to be globular (being rich in α-helix), with the cysteine residues
allowing the formation of an elastic network stabilized by inter- 
chain disulfide bonds. 



Evolutionary Relationships among the S-Rich, S-Poor, 
and HMW Prolamins 

The three groups of prolamins present in the Triticeae all con- 
sist of at least two discrete domains, one of which is based 
on repeated sequences. More detailed comparisons show that 
the prolamins are likely to have evolved from a single ances- 
tral protein. Comparisons of the nonrepetitive domains of a 
range of S-rich prolamins (Kreis et al., 1985a, 1985b; Kreis and 
Shewry, 1989) show that all contain three conserved regions 
of between 20 and 30 residues. These regions, designated 
A, B, and C (Figure 1f), contain most of the conserved cys- 
teine residues and are also related to each other, indicating 
that they are likely to have originated from the triplication of 
a short ancestral domain. Insertion of additional variable 
regions (I1 to I4) and of repeated sequences at the N-terminal
side of region I1 would have given rise to the range of present-
day S-rich prolamins. Short regions related to A, B, and C are
also present in the HMW prolamins, although in this case 
regions A and B are in the N-terminal domain and region C
is in the C-terminal domain (Figure 1h). Therefore, these pro-
teins are likely to have evolved from the same ancestor as did
the S-rich prolamins, although unrelated repeated sequences
have been inserted between regions B and C.

The S-poor prolamins are also clearly related to the S-rich 
prolamins in that their repetitive sequences are based on simi- 
lar proline- and glutamine-rich peptide motifs. For example, 
the heptapeptide and octapeptide motifs present in y-gliadin 
and C hordein (Figure 1) differ in only a single glutamine resi- 
due. The S-poor group is hypothesized to have evolved from 
the S-rich prolamins by further amplification of the repeated 
sequences and deletion of most of the nonrepetitive domain 
that contains regions A, B, and C (Kreis et al., 1985a, 1985b; 
Kreis and Shewry, 1989). 
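
The relationship between the two repeat motifs can be checked directly. The sketch below is illustrative only: it converts the heptapeptide and octapeptide motifs quoted in the text to one-letter codes, confirms that the octapeptide is the heptapeptide plus one glutamine, and computes the proline-plus-glutamine fraction of each motif. The function and constant names are assumptions introduced for this example.

```python
# Three-letter to one-letter codes for the residues that occur in the two motifs.
THREE_TO_ONE = {"Pro": "P", "Gln": "Q", "Phe": "F"}


def to_one_letter(motif: str) -> str:
    """Convert a hyphen-separated three-letter motif to one-letter code."""
    return "".join(THREE_TO_ONE[residue] for residue in motif.split("-"))


# Repeat motifs as given in the text (gamma-gliadin and C hordein, respectively).
HEPTAPEPTIDE = to_one_letter("Pro-Gln-Gln-Pro-Phe-Pro-Gln")
OCTAPEPTIDE = to_one_letter("Pro-Gln-Gln-Pro-Phe-Pro-Gln-Gln")


def extension(shorter: str, longer: str) -> str:
    """Return the residues that extend `shorter` into `longer`."""
    if not longer.startswith(shorter):
        raise ValueError("the longer motif does not begin with the shorter one")
    return longer[len(shorter):]


def pro_gln_fraction(motif: str) -> float:
    """Fraction of residues in the motif that are proline or glutamine."""
    return sum(motif.count(residue) for residue in "PQ") / len(motif)
```

Here `extension(HEPTAPEPTIDE, OCTAPEPTIDE)` gives `"Q"`, the single additional glutamine, and `pro_gln_fraction(OCTAPEPTIDE)` gives 0.875, which is consistent with the high proline and glutamine content attributed to the repetitive domains. Real repeat domains are degenerate, so this exact-match comparison is a simplification.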



The Prolamin Superfamily in Other Species

Prolamins related to those present in the Triticeae are also pres- 
ent in a range of other cereals. These include oats, in which 
the avenins contain regions A, B, and C together with two blocks 
of repeats rich in proline and glutamine (Figure 1i; Egorov, 1988;
Chesnut et al., 1989), and rice. The prolamins of rice consist 
of three groups of small proteins (Kim and Okita, 1988a, 1988b; 
Masumura et al., 1989, 1990). Although these do not contain 
repeated sequences, they appear to be related to one another 
and to the prolamins of the Triticeae. For example, the sulfur-
rich Mr 10,000 prolamins shown in Figure 1l appear to con-
tain vestiges of regions A, B, and C.

The prolamins of maize, known as the zeins, and of related 
Panicoid cereals such as sorghum, pearl millet, and Job's tears,
fall into four groups, three of which belong to the prolamin su-
perfamily. In maize, these are the β-, γ-, and δ-zeins. The β-zeins
(Pedersen et al., 1986) and γ-zeins (Boronat et al., 1986) both
contain regions related to A, B, and C (Figures 1j and 1k). The
δ-zeins do not contain repeats or any other distinguishing fea-
tures, but homology with the prolamin superfamily can be
inferred from some sequence identity with the 2S albumin of
Brazil nut (Kirihara et al., 1988; see later discussion). All three
of these groups of zeins are rich in cysteine and/or methio-
nine, residues deficient in the α-zeins.



The 2S Albumins Are Also Related to the 
Prolamin Superfamily 

The 2S albumins also contain three conserved regions related 
to regions A, B, and C. These regions contain the eight con- 
served cysteine residues present in most 2S albumins (see 
Figures 1a to 1e), with region A and regions B and C corre-
sponding to the small and large subunits, respectively, of the
heterodimeric 2S albumins (for example, napin and conglutin
δ; Figures 1a and 1b). The absence of repeated sequences
and the widespread distribution of 2S albumins in dicots (and
even in ferns; Rodin and Rask, 1990) may indicate that they
are similar to the ancestral protein of the prolamin superfamily,
although this would have lacked the proteolysis site between
regions A and B.
The α-Zeins of Maize

The α-zeins account for ~75 to 80% of the total prolamins in
maize and are classified into two groups with slightly different
Mr (~19,000 and ~22,000). They have similar structures, con-
sisting of unique N- and C-terminal domains flanking repeated
sequences (Marks and Larkins, 1982; Pedersen et al., 1982;
Marks et al., 1985). Although the latter are generally consid-
ered to contain blocks of ~20 residues, they are highly
degenerate, with no clear consensus motif. There is no evi-
dence of homology with the repeated sequences present in
other prolamins, and the unique N- and C-terminal sequences
do not appear to be related to any other protein. The size differ-
ence between the Mr 19,000 and Mr 22,000 zeins may result
from variation in the number of blocks present in the repeti-
tive domains (nine and 10, respectively) or from the insertion
of a loop region of ~20 residues in the C-terminal domain of
the Mr 22,000 proteins.

The precise structure adopted by the α-zeins is still uncer-
tain. Whereas a range of biophysical studies demonstrates that
they have extended conformations when in solution (Tatham
et al., 1993), they may adopt a more compact conformation
when present in the hydrated solid state, that is, in protein bod-
ies (see later discussion). For example, Argos et al. (1982)
proposed that α-zeins form an antiparallel ring of nine α-helices,
facilitating packaging in the protein bodies.



GLOBULIN STORAGE PROTEINS 

The globulins are the most widely distributed group of storage 
proteins; they are present not only in dicots but also in monocots
(including cereals and palms) and fern spores (Templeman
et al., 1987). They can be divided into two groups based on
their sedimentation coefficients (S20,w): the 7S vicilin-type
globulins and the 11S legumin-type globulins. Both groups
show considerable variation in their structures, which results
partly from post-translational processing. In addition, both have
nutritional significance in that they are deficient in cysteine
and methionine, although 11S globulins generally contain
slightly higher levels of these amino acids. The globulin stor-
age proteins have been studied in most detail in legumes,
notably peas, soybean, broad bean (Vicia faba), and French
bean (Phaseolus vulgaris).



The 11S Globulins 

The 11S legumins are the major storage proteins not only in
most legumes but also in many other dicots (for example, bras-
sicas, composites, and cucurbits) and some cereals (oats and
rice). The mature proteins consist of six subunit pairs that in-
teract noncovalently. Each of these subunit pairs consists in
turn of an acidic subunit of Mr ~40,000 and a basic subunit
of Mr ~20,000, linked by a single disulfide bond. Each subunit
pair is synthesized as a precursor protein that is proteolyti-
cally cleaved after disulfide bond formation. Legumins are not
usually glycosylated, an exception being the 12S globulin of
lupin (Duranti et al., 1988). This contrasts with the 7S globu-
lins (see later discussion).
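
The subunit composition given above implies an approximate size for the assembled 11S protein. The short calculation below is a sketch under the assumption that the mature protein is simply the sum of six acidic/basic pairs at the approximate Mr values quoted in the text; the function name and constants are illustrative, and the result is an estimate rather than a measured value.

```python
# Approximate Mr values quoted in the text for one 11S subunit pair.
ACIDIC_SUBUNIT_MR = 40_000
BASIC_SUBUNIT_MR = 20_000
PAIRS_PER_MATURE_PROTEIN = 6


def legumin_mr(pairs: int = PAIRS_PER_MATURE_PROTEIN) -> int:
    """Estimate the Mr of an 11S assembly from its number of subunit pairs."""
    if pairs < 1:
        raise ValueError("an assembly needs at least one subunit pair")
    return pairs * (ACIDIC_SUBUNIT_MR + BASIC_SUBUNIT_MR)
```

With the default of six pairs, `legumin_mr()` evaluates to 360,000. Because the subunit values are themselves approximate, this figure should be read only as an order-of-magnitude check on the stated composition.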

Although the 11S globulin of Brazil nut was one of the first 
proteins to be crystallized (Maschke, 1859), the crystals of this 
and other 11S globulins have generally been small and disor- 
dered and have failed to provide any details of protein structure. 
However, a recent study of edestin, an 11S globulin from hemp-
seed, is more promising. Although the crystals showed some
disorder, they exhibited enough symmetry so that some mea-
surements could be made. These indicated that the subunits
are arranged in an open ring structure, oriented alternately
up and down, in a disk whose diameter is 145 Å and whose
thickness is ~90 Å (Patel et al., 1994).



The 7S Globulins 

7S globulins are typically trimeric proteins of Mr ~150,000 to
190,000 that lack cysteine residues and hence cannot form
disulfide bonds. Their detailed subunit compositions vary con-
siderably, mainly because of differences in the extent of
post-translational processing (proteolysis and glycosylation).
For example, the vicilin subunits of pea are initially synthesized
as groups of polypeptides of Mr ~47,000 and ~50,000, but
post-translational proteolysis and glycosylation then give rise
to subunits with Mr values between 12,500 and 33,000 (Fig-
ure 2; Gatehouse et al., 1984; Casey et al., 1986, 1993). These
subunits are difficult to purify and characterize, but molecular 
cloning allowed their origins and the sites of proteolytic cleav- 
age and glycosylation to be identified. 

The 7S globulins of P. vulgaris and soybean differ from those
of pea and V. faba in that glycosylation is more extensive but
proteolysis does not occur. For example, the 7S phaseolin of 
Figure 2. Schematic Diagram Showing the Origin of the Pea Vicilin
Subunits.

Based on Gatehouse et al. (1984) and Casey et al. (1993). The Mr 19,000,
Mr 13,500, and Mr 12,500/16,000 subunits are shown in red, orange,
and green, respectively.
Figure 3. Alignment of Phaseolin and Glycinin 2 Subunit Sequences.

The alignment of β-phaseolin (French bean 7S globulin) and glycinin 2 (soybean 11S globulin) is based on the structure of phaseolin (Lawrence
et al., 1994). Two structurally similar units, A and A' (shown in yellow), have been defined by x-ray crystallography. Each unit consists of a β-barrel
with a "jelly roll" motif followed by an α-helical domain comprising three helices. Unit A is located in the acidic subunit of the 11S protein and
unit A' in the basic subunit. The blue areas flanking and separating regions A and A' correspond to regions of sequence homology that are not
reflected in the secondary structures as determined by crystallography. The distribution of globally conserved residues in the 7S/11S alignment
indicates the presence of four major insertions in the 11S protein (shown in red). Two of these are in unit A and one is in unit A', all falling within
loop regions connecting structural elements. The fourth insertion is in the region of sequence homology between A and A', at the C-terminal
end of the acidic subunit. Insertions and deletions of less than six residues are not shown. The signal peptide of the β-phaseolin, shown in green,
has limited sequence homology with the N-terminal region of the mature glycinin 2 precursor. Based on data of Lawrence et al. (1994).



P. vulgaris consists of glycosylated subunits with Mr values be-
tween ~43,000 and 53,000 (Hall et al., 1977; Bollini and
Chrispeels, 1978).

The three-dimensional structures of several 7S globulins 
have recently been determined using x-ray crystallography 
(Lawrence et al., 1990, 1994; Ko et al., 1993). These show that
the trimeric proteins are disk shaped, with diameters of ~90
Å and thicknesses of 30 to 40 Å.



11S and 7S Globulins Are Related 

Although the 7S and 11S globulins show no obvious sequence
similarities, they do have similar properties, including the ability
to form both trimeric and hexameric structures. In the case
of the 7S globulins, the mature protein is trimeric, but it may
undergo reversible aggregation into hexamers, depending on
the ionic strength (Thanh and Shibasaki, 1979). The mature
11S globulin, by contrast, is hexameric but is initially assem-
bled and transported through the secretory system as an
intermediate trimer (Gatehouse et al., 1984; Müntz et al., 1993).
Therefore, it is not surprising that more sophisticated compar-
isons have shown that the 11S and 7S globulin subunits are
related in structure (Gibbs et al., 1989; Lawrence et al., 1994).
Such comparisons indicate that the basic (C-terminal) chain
of the 11S legumins is related to the C-terminal region of the
7S vicilins. Lawrence et al. (1994) determined the x-ray crystal
structure of 7S phaseolin and established sequence homolo-
gies between 7S/11S proteins. The homologies showed that
the 11S sequences, with four major insertions of sequence,
can be aligned with 7S sequences. The authors further pro-
posed that the 11S legumins have a tertiary structure similar
to that of the 7S vicilins and concluded that the 7S and 11S
proteins evolved from a common ancestral protein (Figure 3).



SYNTHESIS, ASSEMBLY, AND DEPOSITION 
OF SEED STORAGE PROTEINS 



All of the seed storage proteins discussed here are secretory 
proteins synthesized with a signal peptide that is cleaved as 
the protein is translocated into the lumen of the endoplasmic 
reticulum (ER). The subsequent events in storage protein pro-
cessing are less clearly understood and may vary not only
between different species but also within the same species,
depending on the protein type and stage of development. The
events that occur in the different compartments of the secre-
tory system are discussed later and summarized in Table 1.



Storage Protein Folding and Assembly in the ER

Secretory proteins assume their folded conformations within 
the lumen of the ER, which is also the site of disulfide bond 
formation. Studies of other systems demonstrate that three 
types of ER lumenal proteins may assist in these processes. 
Molecular chaperones of the HSP70/BiP family may facilitate 
folding by binding transiently to the nascent polypeptides and 
may also prevent the formation of incorrect inter- or intramolecu- 
lar interactions. BiP-related proteins are present in developing 
endosperms of cereals such as rice (Li et al., 1993b), wheat
(Giorini and Galili, 1991), and maize (Boston et al., 1991), and
they accumulate in higher than normal levels in high-lysine 
maize mutants (Boston et al., 1991), possibly due to the pres- 
ence of incorrectly folded zeins. 

A second group of proteins, the peptidyl-prolyl cis-trans
isomerases (PPI), or cyclophilins, of which one subclass (the
S-cyclophilins) is resident in the ER lumen, may also assist 
folding by accelerating the isomerization of Xaa-Pro bonds, 
Table 1. Processing and Assembly of Storage Proteins in the Secretory System

Compartment  Event                                   2S Albumins  Prolamins  7S Globulins  11S Globulins

ER           Co-translational insertion              + a          +          +             +
             Signal peptide cleavage                 +            +          +             +
             Chaperone-mediated folding (BiP) b      NA           NA         NA            NA
             S-S bond formation (PDI)                NA           + c                      +
             Isomerization of Xaa-Pro bonds (PPI) d  NA           NA         NA            NA
             N-Glycosylation                                                 +
Golgi        Complex glycan addition                                         +
Vacuole      Propeptide processing                                           NA            + f
             (cleavage at asparagine
             residues by proteases)

a +, the process has been experimentally observed; -, the process has been looked for but not detected; NA, the process has not been
examined.

b Although BiP is likely to interact transiently with every elongating nascent chain translocating across the ER membrane, its subsequent role
in folding is likely to be protein dependent.

c Because the majority of storage proteins form disulfide bonds, it is assumed that this reaction is catalyzed by protein disulfide isomerase
(PDI) in the ER lumen.

d Peptidyl-prolyl cis-trans isomerase (PPI) may be required for the folding in the ER of proline-rich proteins (Stamnes et al., 1992) such as the
prolamins.

e Aspartic and thiol proteases have been characterized from B. napus and castor bean, respectively.
f Thiol proteases have been characterized from soybean and castor bean.



which is a rate-limiting step in the folding of some proteins. 
The repetitive domains of cereal prolamins contain high lev-
els of proline, and isomerization of Xaa-Pro bonds might
therefore be expected to limit their folding. This does not, how-
ever, appear to be the case, at least in vitro. For example,
C hordein contains ~30 mol % proline residues, all of which
appear to be in the trans configuration (Tatham et al., 1985).
Nevertheless, it folds readily in vitro (Tamas et al., 1994). Our
preliminary studies also show that the levels of cyclophilin-
related transcripts decrease during the period of gluten pro-
tein synthesis in developing wheat seeds (B. Grimwade, R.
Freedman, P. Shewry, and J. Napier, unpublished results). 7S
and 11S globulin subunits are also assembled in the ER, with
the 7S globulins forming the mature trimers (Ceriotti et al., 
1991). 

Whether protein disulfide isomerase (PDI) catalyzes disul- 
fide bond formation in storage proteins also remains to be 
established, although Bulleid and Freedman (1988) showed 
that depletion of PDI from dog pancreas microsomes resulted 
in defective synthesis of disulfide bonds in a γ-gliadin synthe-
sized in vitro. PDI has also been shown to be associated with
the ER in developing wheat endosperms (Roden et al., 1982), 
although the levels of PDI transcripts peak somewhat earlier 
than those of gluten proteins (B. Grimwade, R. Freedman, P. 
Shewry, and J. Napier, unpublished results). The assembly of 
some prolamins into disulfide-stabilized polymers presumably
also takes place in the ER, although there is no information 
available on how this occurs. 

N-linked glycosylation of the 7S phaseolin subunits also occurs
in the ER lumen, probably as a cotranslational event
(Bollini et al., 1983; Vitale et al., 1993). Phaseolin has two
consensus N-glycosylation sites, one of which is always used,
whereas the other, located closer to the C terminus, is used 
less frequently (Ceriotti et al., 1991). Wild-type phaseolin as- 
sembles into trimers in the ER, but trimerization is prevented 
if a C-terminal sequence of 59 amino acid residues is deleted.
In this case, the protein monomer remains in the ER and be- 
comes glycosylated at the second site. Moreover, this 
assembly-defective protein interacts with BiP in an ATP- 
dependent manner, highlighting the role of BiP in binding to 
malfolded proteins (Pedrazzini et al., 1994). 



Storage Protein Transport and Protein Body Formation 
in Cereals 

Two routes of protein body formation appear to operate in
developing cereal endosperms: in one, the protein body forms
from the vacuole, and in the other, it forms from
the ER. For example, the major storage proteins in oats and
rice are related to the 11S globulins of dicots and appear to 
be transported from the ER lumen via the Golgi apparatus to 
the vacuole. The protein bodies then appear to form by frag- 
mentation of the vacuole. In contrast, the prolamins of rice 
(Krishnan et al., 1986) and maize (Larkins and Hurkman, 1978) 
appear to be retained within the lumen of the ER, which be- 
comes distended to form protein bodies. Thus, rice endosperm 
cells contain two populations of protein bodies, some of vacuolar
origin (containing glutelins) and others of ER origin
(containing prolamins).

The situation appears to be more complicated in barley,
wheat, and rye, with prolamins present in both ER-derived and
vacuolar protein bodies. In the case of wheat, these may differ
in their protein content, as Rubin et al. (1992) suggested
when they reported that glutenins are retained predominantly 
in ER-derived protein bodies, whereas gliadins are present in 
both types of protein body. In addition, Levanony et al. (1992) 
proposed that ER-derived protein bodies may subsequently 
fuse with the vacuoles, bypassing the Golgi apparatus. 

The mechanisms that determine whether a prolamin is trans- 
ported to the vacuole or retained in the ER are not known, 
because neither vacuolar targeting nor ER retention sequences 
have been identified. Li et al. (1993b) have suggested that rice 
prolamins are retained in the ER by interaction with BiP, which 
itself has a C-terminal ER retention sequence. Although the 
work of Li et al. (1993b) implies a "once only" binding of BiP
to the emerging nascent polypeptide chain, it is now known 
that BiP binds to such chains in a sequential manner, "pull- 
ing" the protein into the ER lumen (Wickner, 1994). This 
indicates that a stable interaction of BiP with a nascent prola- 
min chain is very unlikely. In fact, the only clear examples of 
BiP binding to storage proteins are to malfolded or assembly- 
defective forms (D'Amico et al., 1992; Zhang and Boston, 1992; 
Pedrazzini et al., 1994). The expression of BiP-related tran- 
scripts is not coordinated with prolamin gene expression in 
developing endosperms of wheat (B. Grimwade, R. Freedman, 
P. Shewry, and J. Napier, unpublished results), and expression
of wheat γ-gliadin in seed of transgenic tobacco plants
does not alter the level of BiP transcripts (G. Richard, M. Turner,
J. Napier, and P. Shewry, unpublished results), which suggests
that BiP is unlikely to be involved in prolamin retention in the ER. 

Studies of γ-gliadin transport and retention in Xenopus oocytes
seem to indicate that prolamin accumulation in the ER
does not require any plant-specific factors (Altschuler et al.,
1993). Whereas a truncated form of the protein correspond- 
ing to the N-terminal domain accumulated in the ER to form 
protein body-like structures, a truncated form containing the 
C-terminal domain was secreted. The intact wild-type protein 
was also secreted, but at a lower rate than was the C-terminal 
domain. Although the prolamin repetitive domains could be 
responsible for ER retention by interacting with ER compo- 
nents, a simpler model is that interactions between the 
individual prolamin molecules result in the formation of insolu- 
ble aggregates that are not readily transported from the ER 
lumen. Such a model is supported by the observation of Li 
et al. (1993a) that rice prolamin mRNAs are segregated to a 
distinct region of the rough ER. Such segregation could allow 
aggregation of the prolamins to occur in localized parts of the 
ER, preventing widespread effects on ER integrity. 



Storage Protein Transport and Protein Body Formation 
in Dicots

The 2S albumins and 7S/11S globulins of legumes and other 
dicots are transported via the Golgi apparatus to the vacuole,
which fragments to form protein bodies. Despite several attempts,
specific vacuolar targeting sequences have not been
identified in these proteins (Muntz, 1989; Chrispeels, 1991;
Saalbach et al., 1991; Chrispeels and Raikhel, 1992; Muntz 
et al., 1993). Instead, it is probable that one or more exposed 
regions of the correctly folded protein are recognized by the 
sorting machinery. 

The assembly of the 11S globulins appears to be a highly 
regulated event. The monomeric proteins are initially assem- 
bled in the lumen of the ER into trimers that are then 
transported from the ER to the storage vacuole, where they 
are assembled into their final hexameric form. This assembly 
process requires specific proteolytic cleavage of the subunits 
present in the trimers (Dickinson et al., 1989). Uncleaved 
trimers cannot assemble into hexamers in vitro unless they 
have been treated with papain. This cleavage does not cause 
the trimers to disassemble but may result in a conformational 
change that favors assembly into hexamers. The 11S globulin
vacuolar processing protease has been characterized from sev- 
eral species and shown to recognize asparagine processing 
sites specifically. Scott et al. (1992) purified a soybean pro- 
tease that cleaves the trimeric 11S globulin proproteins. 
Hara-Nishimura et al. (1993) also purified an 11S globulin
processing peptidase from castor bean that displays similar
processing specificity and also appears to be a thiol protease
but is unglycosylated.

The 11S globulin processing peptidase of Hara-Nishimura 
et al. (1993) is also able to cleave 2S albumin precursors in 
vitro at their asparagine processing sites. A 2S albumin pro- 
cessing protease that cleaves 2S albumin proproteins from 
Arabidopsis in vitro has also been characterized (D'Hondt et 
al., 1993). Although this enzyme has the same specificity as 
that of the one purified by Hara-Nishimura et al. (1993), it is 
an aspartic protease rather than a thiol protease. The process- 
ing of 2S albumins does not appear to be required for their 
assembly, in contrast with the case of the 11S globulins. 



Storage Protein Packaging 

Little is known about how storage proteins are organized within 
protein bodies, although this organization may well be impor- 
tant in ensuring efficient use of storage space and facilitating 
mobilization of storage proteins during germination. Whereas 
prolamin and globulin storage proteins are present in sepa- 
rate protein bodies in rice (see previous discussion), they are 
located within the same protein bodies (although in separate 
phases of them) in other cereals. Prolamin inclusions are pres- 
ent in a globulin matrix in oats (Lending et al., 1989), and 
globulin (triticin) inclusions are present within a prolamin ma- 
trix in wheat (Bechtel et al., 1991). 

In leguminous plants, the 7S and 11S globulins appear to 
be in the same protein bodies with no spatial separation (see 
Harris et al., 1993). As discussed previously, their structures
may facilitate efficient packaging. In many other dicots, such as 
pumpkin, sunflower, brassicas, and castor bean, 2S albumins 
are stored together with 11S globulins, but how these distinct
types of storage protein are organized in the protein bodies
is not known. In contrast, there is evidence that the different
types of prolamins are spatially separated in the protein bod- 
ies of cereal endosperms. This is most clear in maize, where 
immunogold labeling has shown that the α-zeins form the protein
body core, with β- and γ-zeins at the periphery (Lending
et al., 1992). The δ-zeins may also be present in the protein
body core (Esen and Stetler, 1992). The evidence for spatial 
separation of prolamins in protein bodies of the Triticeae is 
less convincing, but Rechinger et al. (1993) proposed that the
quantitatively minor S-rich γ1- and γ2-hordeins form a peripheral
layer surrounding a core of B hordeins (S-rich) and
C hordeins (S-poor) in barley. The separation of different pro- 
teins in cereal protein bodies could result from the properties 
of the proteins themselves (for example, their ability to sepa- 
rate into separate phases) or from different patterns of 
deposition during protein body biogenesis. 



FUTURE DIRECTIONS 



Much of the recent work on seed storage proteins was per- 
formed to provide a basis for improving the nutritional and 
processing properties of crops using genetic engineering. The 
recent development of reliable transformation procedures for 
maize, small grain cereals (wheat and barley), and grain le- 
gumes means that this is now possible. 

A detailed understanding of storage protein structure and 
diversity is an important prerequisite for attempts to manipu- 
late quality because it indicates the extent to which the structure 
of the proteins can be manipulated without affecting their 
biological properties. For example, much of the work on en- 
gineering 2S albumins has been based on manipulation of 
a variable "loop" region identified by sequence comparisons 
(see previous discussion), whereas Wallace et al. (1988) used
a structural model for zein (Argos et al., 1982) to identify sites
for the addition of lysine residues. We are using a similar ap- 
proach to develop novel wheats with improved bread-making 
quality, based on our understanding of the structures and prop- 
erties of individual gluten proteins. 

Transgenic plants are being used to develop improved lines 
for incorporation into plant breeding programs, but equally im- 
portant is their use as tools to explore aspects of seed protein 
folding, assembly, transport, and deposition. There are still 
many gaps in our knowledge of these processes. Current evi- 
dence indicates that ER lumenal proteins such as PDI and 
BiP may play a role in storage protein folding and assembly
in vitro and under abnormal circumstances (for example, in 
high-lysine mutants of maize), but we do not know whether 
they are required for storage protein accumulation under normal
conditions.

The mechanisms of protein targeting and protein body formation
are also obscure. Although it is well established that
the 2S albumin, 7S globulin, and 11S globulin storage proteins
are transported to the vacuole, none of these proteins appears
to contain a cleavage-targeting peptide like the N- or C-terminal
peptides present in other proteins known to be transported
to plant vacuoles (Chrispeels and Raikhel, 1992). Defining the 
precise targeting mechanisms for these proteins may be diffi- 
cult if the sorting machinery in the Golgi apparatus recognizes 
one or more short peptide sequences exposed on the surface 
of the correctly folded proteins. However, identification of these 
sorting determinants is necessary for identification and isola- 
tion of the receptors within the secretory system that interact 
with them. 

Prolamin targeting and deposition in cereals are less well 
understood, with some components apparently retained in the 
ER and others transported to the vacuole. Expression of 
epitope-tagged mutant proteins will allow the products of trans- 
genes to be followed in a homologous background, whereas 
analysis of lines with increased or decreased levels of ER lu- 
menal proteins and components of the sorting machinery (for 
example, small GTP binding proteins and vesicle coat proteins, 
or COPs; Pelham, 1994) will undoubtedly add to our knowl- 
edge of prolamin targeting. However, one additional factor 
needs to be considered: the physical properties of the folded
proteins (and in particular those that form disulfide-bonded 
polymers), which may result in an ER retention system based 
on solubility rather than specific recognition processes. The 
segregation of prolamin mRNAs to specific regions of the rough 
ER may ensure that this retention does not cause widespread 
disruption of ER integrity and function. 